This paper introduces a new HDTV coder based 
on motion compensation, subband coding, and high 
order conditional entropy coding. The proposed coder 
exploits the temporal and spatial statistical dependen- 
cies inherent in the HDTV signal by using intra- and 
inter-subband conditioning for coding both the motion 
coordinates and the residual signal. The new frame- 
work provides an easy way to control the system com- 
plexity and performance, and inherently supports mul- 
tiresolution transmission. Experimental results show 
that the coder outperforms MPEG-2, while still main- 
taining relatively low complexity. 

I. INTRODUCTION 

Several methods have been proposed recently for trans- 
mission of HDTV [1, 2, 3, 4, 5, 6, 7]. Most employ mo- 
tion compensation at one stage or another, after which the 
residual between the original and predicted frames is com- 
puted and encoded spatially. DCT-based spatial coders 
are widely used, most notably in the MPEG standards. 
However, subband coders are also becoming popular. 

There are many important issues associated
with HDTV coding, such as control over the bit rate and
picture quality, error correction and concealment, and
multiresolution capability for multisource decoding and
progressive transmission applications. In this paper, we
introduce a new subband video coder which not only achieves
good performance with relatively low complexity, but also
provides a framework where most of these issues can be
easily addressed. The proposed coder employs motion estimation
and compensation independently for each subband, but 
encodes the motion vectors using a high order conditional 
entropy coding scheme that exploits statistical dependen- 
cies between motion vectors of the same frame and suc- 
cessive frames as well as between the coordinates of the 
motion vectors, simultaneously. The coder also identifies 
non-compensatable blocks through the use of statistically 
optimized thresholding, which are then intra-frame coded. 
The video coder is described next. This is followed by a
discussion of practical design issues. Section IV presents
experimental results which compare the performance and
complexity of the coder with that of MPEG-2.

II. THE VIDEO CODER 

First, consider a conventional subband video coder. In 
the parlance of MPEG, the frames that are coded spatially 
are called I frames. Those that are forward-predicted are 
called P frames, and those that are forward- and backward- 
predicted are called B frames. The sequence of video 
frames is first grouped into blocks of N frames, where 
the first frame (or I frame) is coded using an intra-frame 
subband coder, and the other N-1 frames (or P frames)
are predicted using motion estimation and compensation, 
and the residual frames are coded using another subband 
coder. In this work, no B frames are used. At the receiver, 
each video frame is constructed from motion information 
(if applicable) and the coded residual frame. 

There are two important problems associated with the
above coder. First, motion compensation using the block
matching algorithm with a typical block size of 16 x 16
and search range of -16 to +16 in each dimension is
usually computationally intensive. This problem becomes 
even worse in HDTV coding because both block sizes and 
search areas have to be somewhat larger to achieve good 
performance. Second, due to the block matching algo- 
rithm, blockiness frequently appears in the residual frame, 
which introduces artificial high frequencies. To solve these 
two problems, we apply the block matching algorithm to 
each of the subbands. Figure 1 shows a block diagram of 
the proposed subband coder and Figure 2 shows the structure
of the RVQ coder. Each frame is first decomposed
into subbands using a tree-structured IIR analysis filter 
bank. The filter bank is based on two-band decomposi- 
tions, which employ allpass polyphase separable IIR filters 
[8]. A full-search block matching algorithm (BMA) using 
the mean absolute distance (MAD) is used to estimate the 
motion vectors. Since the BMA does not necessarily pro- 
duce the true motion vectors, we employ a thresholding 
technique for improving the rate-distortion performance. 
Let $d_{\min}$ be the minimum MAD associated with a block
to be coded. Also, let $T$ be a threshold, which is a large
positive number empirically determined from the statistics
of the subband being coded. If $d_{\min} > T$, then the
block is likely not compensatable. Thus, both the original
block and the residual block, obtained by subtracting
the motion compensated predicted block from the original
one, are coded using the intra-band and residual coders,
respectively, and the one leading to better rate-distortion
performance is chosen (as will be described shortly). A
special symbol, which can be coded as part of the motion
information, is sent to the decoder indicating the type of
coding used.

Figure 1: Proposed subband video coder.

Figure 2: Basic structure of the RVQ coder.
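
The following is a minimal Python sketch of the full-search block matching and thresholding test described above, applied to a single subband. The function names, the list-of-lists frame layout, and the boundary handling (candidates falling outside the reference subband are skipped) are illustrative assumptions, not details taken from the coder.

```python
def mad(block_a, block_b):
    """Mean absolute difference between two equally sized 2-D blocks."""
    total = sum(abs(a - b)
                for row_a, row_b in zip(block_a, block_b)
                for a, b in zip(row_a, row_b))
    return total / (len(block_a) * len(block_a[0]))


def get_block(frame, top, left, size):
    """Extract the size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def full_search(cur, ref, top, left, size, search):
    """Return (d_min, (dy, dx)) over all displacements within +/- search."""
    target = get_block(cur, top, left, size)
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r, c = top + dy, left + dx
            # Assumption: candidates outside the reference subband are skipped.
            if r < 0 or c < 0 or r + size > len(ref) or c + size > len(ref[0]):
                continue
            d = mad(target, get_block(ref, r, c, size))
            if best is None or d < best[0]:
                best = (d, (dy, dx))
    return best


def needs_mode_decision(d_min, threshold):
    """True when the block is likely not compensatable (d_min > T)."""
    return d_min > threshold
```

With a search range of -2 to +2 in each dimension, `full_search` evaluates at most 25 candidate displacements per block, which illustrates why applying the search to small subband blocks costs far less than a -16 to +16 search on full-resolution 16 x 16 blocks.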

In many conventional HDTV subband coders as well 
as in MPEG, differential entropy coding of motion vec- 
tors is employed. Since motion vectors are usually slowly 
varying, the motion bit rate can be further reduced by 
exploiting dependencies not only between previous mo- 
tion vectors within and across the subbands but also be- 
tween the vector coordinates. For this purpose, we em- 
ploy a high order conditional entropy coder that is based 
on finite state machine (FSM) modeling. More specifically,
let $(X_{n,m}, Y_{n,m})$ be the pair of random variables
representing the current horizontal and vertical motion
displacements in the current subband $(n, m)$ in frame $n$.
Also, let $(U_{n,m}, V_{n,m})$ be the pair of state random
variables, which are determined by the
previously coded conditioning symbols. The mappings
$F_{n,m}$ and $G_{n,m}$ are generally many-to-one mappings that
map combinations of conditioning
symbols to a particular state. Assuming that $X_{n,m}$ is
entropy coded first, the conditioning symbols for the FSM
model associated with $X_{n,m}$ are selected from a region
composed of symbols located in all previously coded
subbands (i.e., where motion vectors were already coded) in
both frames $n$ and $n - 1$. When $Y_{n,m}$ is being coded, the
horizontal displacements in the same subband can also be
included in the conditioning region.


Statistical modeling for entropy coding the motion vectors
consists of first selecting, for each subband $(n, m)$,
$M_{n,m}$ ($N_{n,m}$) conditioning symbols for $X_{n,m}$ ($Y_{n,m}$), and
then finding mappings $F_{n,m}$ and $G_{n,m}$ such that the
conditional entropies $H(X_{n,m} | U_{n,m})$ and $H(Y_{n,m} | V_{n,m})$ are
minimized subject to a limit on complexity. The total
number of probabilities that must be computed and stored
is used here as a measure of complexity. The tree-based
algorithms described in [9] are used to find the best values of
$M_{n,m}$ and $N_{n,m}$ subject to a limit $C_1$ on the total number
of probabilities. The PNN algorithm [10], in conjunction
with the generalized BFOS algorithm [11], is then used to
construct mapping tables that represent $F_{n,m}$ and $G_{n,m}$
subject to another limit $C_2$ ($C_2 \ll C_1$) on the number of
probabilities.
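
The tradeoff between conditional entropy and the number of stored probabilities can be illustrated with a small Python sketch. It only evaluates a given many-to-one mapping on empirical data; it does not implement the tree-based, PNN, or generalized BFOS algorithms cited above. All names and the toy data layout are illustrative assumptions.

```python
from collections import Counter
from math import log2


def conditional_entropy(pairs):
    """Empirical H(X | U) in bits from a list of (state, symbol) pairs."""
    joint = Counter(pairs)
    state_counts = Counter(u for u, _ in pairs)
    n = len(pairs)
    h = 0.0
    for (u, x), count in joint.items():
        # p(u, x) * log2 p(x | u), accumulated with a negative sign.
        h -= (count / n) * log2(count / state_counts[u])
    return h


def apply_mapping(samples, mapping):
    """Replace each conditioning context by its state (many-to-one table)."""
    return [(mapping[context], x) for context, x in samples]


def table_size(pairs):
    """Number of conditional probabilities stored: states x alphabet size."""
    states = {u for u, _ in pairs}
    alphabet = {x for _, x in pairs}
    return len(states) * len(alphabet)
```

Merging contexts into fewer states shrinks `table_size` but can only keep or increase `conditional_entropy`, which is the tradeoff that the limits on the number of probabilities control.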

The intra-band (I-subband) and residual (P-subband) 
coders are multistage residual vector quantizers (RVQs)
followed by high order conditional statistical models,
which are optimized to the intra-band and residual band 
statistics, respectively. Multistage RVQs provide an easy 
way to control the complexity-performance tradeoffs, and 
allow efficient high order statistical modeling. We restrict 
the number of code vectors per stage to be 2, which sim- 
plifies both statistical modeling and entropy coding used 
in this work. This also provides the highest resolution in 
a progressive transmission environment. 
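
A minimal Python sketch of a multistage RVQ with two code vectors per stage is given below. For brevity it swaps in a greedy stage-by-stage search in place of the dynamic M-search used in this work, and the codebooks are assumed to be given; the function names are illustrative assumptions.

```python
def squared_error(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rvq_encode(vector, stages):
    """Greedy RVQ encoding: pick the nearer of 2 code vectors at each stage."""
    residual = list(vector)
    bits = []
    for codebook in stages:
        index = min((0, 1), key=lambda i: squared_error(residual, codebook[i]))
        bits.append(index)
        # The next stage quantizes what this stage failed to represent.
        residual = [r - c for r, c in zip(residual, codebook[index])]
    return bits


def rvq_decode(bits, stages):
    """Sum the selected stage code vectors; fewer bits give a coarser result."""
    out = [0.0] * len(stages[0][0])
    for index, codebook in zip(bits, stages):
        out = [o + c for o, c in zip(out, codebook[index])]
    return out
```

Because each stage contributes exactly one binary decision, decoding only the first few entries of `bits` yields a coarser reconstruction, which is how the stage structure lends itself to progressive transmission.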

The same statistical modeling algorithm used for en- 
tropy coding the motion vectors is also used for entropy 
coding of the output of the RVQs. Both the motion vec- 
tors and the output of the RVQs are eventually coded 
using adaptive binary arithmetic coders (BACs) [12, 13]. 
These coders are very easy to adapt and have low
complexity.
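
The benefit of selecting a separate adaptive binary model per FSM state can be sketched in Python as follows. This is not the BAC of [12, 13]: it uses a simple count-based probability estimate and reports the ideal code length rather than producing a bitstream. The class name, the initial counts, and the update rule are all assumptions made for illustration.

```python
from math import log2


class AdaptiveBitModel:
    """Count-based estimate of P(bit) for one FSM state (assumed estimator)."""

    def __init__(self):
        self.counts = [1, 1]  # assumption: both bit values start equally likely

    def cost(self, bit):
        """Ideal code length in bits for `bit` under the current estimate."""
        return -log2(self.counts[bit] / sum(self.counts))

    def update(self, bit):
        """Adapt the estimate after coding `bit`."""
        self.counts[bit] += 1


def code_length(bits, states, num_states):
    """Total ideal code length when each state owns its own adaptive model."""
    models = [AdaptiveBitModel() for _ in range(num_states)]
    total = 0.0
    for bit, state in zip(bits, states):
        total += models[state].cost(bit)
        models[state].update(bit)
    return total
```

When the state is informative about the next bit, `code_length` with per-state models is smaller than with a single shared model, which is the gain that the conditional statistical models aim to capture.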











III. PRACTICAL DESIGN ISSUES

To achieve the lowest bit rate, the statistical models 
used to entropy code the motion vectors should be gen- 
erated on-line. However, this requires a two-pass process 
where statistics are generated in the first pass, and the 
statistical modeling algorithm described above is used to 
generate the conditional probabilities. These probabilities 
must then be sent to the BAC decoders so that they can 
track the corresponding encoders. In most cases, this is
computationally expensive. Moreover, even when the number
of states is restricted to be relatively small (such as 8), the
side information can be excessive. Therefore, we choose
to initialize the encoder with a generic statistical model, 
which we generate using a training HDTV sequence, and 
then employ dynamic adaptation [12] to track the local 
statistics of the motion flow. 

For both the I-subbands and P-subbands, the multi- 
stage RVQs and associated statistical models are designed 
jointly using an entropy and complexity-constrained algo- 
rithm, which is described in [9, 14]. The design algorithm 
iteratively minimizes the expected distortion $E\{d(X, \hat{X})\}$
subject to a constraint on the overall entropy of the
statistical models. The algorithm is based on a Lagrangian
minimization and employs a Lagrangian parameter $\lambda$ to
control the rate-distortion tradeoffs. To substantially
reduce the complexity of the design algorithm, only separate
subband encoders and decoders are used. However,
the RVQ stage encoders in each subband are jointly optimized
through dynamic M-search, and the decoders are jointly
optimized using the Gauss-Seidel algorithm.

The most important part of the design algorithm is 
the encoding procedure, where either an intra-frame or 
inter-frame subband coder must be chosen for a particular
block. Suppose we want to encode a block $B'_{n,m}$ of size
$L_{n,m}$ using the proposed I-subband and P-subband coders
with Lagrangian parameters (or quality factors) $\lambda_I$ and
$\lambda_P$, respectively. The BMA is first applied, and
the minimum MAD $d_{\min}$ is computed. If $d_{\min} < T$, then
the corresponding motion vector is encoded using the BAC
specified by the current state, and the residual block is
quantized using the P-subband (residual) RVQ. The output
of each RVQ stage is encoded with a separate entropy
coder composed of an FSM statistical model and a set of
BACs, each specified uniquely by a state. If $d_{\min} \geq T$,
then the block is both I-subband and P-subband coded.
Let $R_x = -\log_2 p(x'|u')$ and $R_y = -\log_2 p(y'|v')$ be
estimates of the number of bits required to code the horizontal
and vertical coordinates of the motion vector, respectively.
Also, let $d_P$ be the distortion and $R_P$ be the rate that
compose the minimum Lagrangian $J_P = d_P + \lambda_P R_P$
associated with coding the residual block. Assuming that
$J_I = d_I + \lambda_I R_I$ is the minimum Lagrangian associated
with coding the original block, then the I-subband coding
method is selected if

$J_I < d_P + \lambda_P (R_x + R_y + R_P)$.

Figure 3: The 114th frame of the sequence BRITS.
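
The block-level decision just described can be sketched in Python as follows. The distortions and rates are assumed to have been computed already by the two coders; the function and argument names are illustrative assumptions.

```python
def lagrangian(distortion, rate, lam):
    """Lagrangian cost J = d + lambda * R."""
    return distortion + lam * rate


def choose_mode(d_min, threshold, d_i, r_i, lam_i, d_p, r_p, lam_p, r_x, r_y):
    """Return 'P' or 'I' for one block, following the rule in the text."""
    if d_min < threshold:
        # Compensatable block: only the residual (P-subband) coder is used.
        return "P"
    j_i = lagrangian(d_i, r_i, lam_i)
    # The inter-coded cost also pays for both motion vector coordinates.
    j_p_total = lagrangian(d_p, r_x + r_y + r_p, lam_p)
    return "I" if j_i < j_p_total else "P"
```

Including the motion vector rate on the P-subband side means that a block with a cheap residual but an expensive, poorly predicted motion vector can still be coded with the I-subband coder.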

The proposed coder has many practical advantages, 
due to both the subband structure and the multistage 
structure of RVQ. For example, multiresolution transmis- 
sion can be easily implemented in such a framework. An- 
other example is error correction, where the more probable 
of the two stage code vectors is selected if an uncorrectable 
error is detected. Since each stage code vector represents 
only a small part of the coded vector, this should not sig- 
nificantly affect the reconstruction or the FSM statistical 
models. 

IV. EXPERIMENTAL RESULTS 

The image shown in Figure 3 is frame number 114 of 
the test sequence BRITS, which we encode using both the 
proposed coder and MPEG-2. The frame size is 720 x 1280.
The original RGB color sequence with 8 bits/pixel requires
approximately 1.3 Gbits/sec. The MPEG-2 software we used resides
on ftp.netcom.com:/pub/cfogg/mpegS [15]. 
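
As a rough check on the quoted raw bit rate, the arithmetic can be written out in Python. The text gives neither the frame rate nor the number of bits per color component explicitly; the sketch assumes 8 bits for each of the three RGB components and 60 frames/sec, since these values reproduce the stated figure of about 1.3 Gbits/sec.

```python
def raw_bit_rate(width, height, bits_per_sample, samples_per_pixel, frames_per_second):
    """Uncompressed bit rate in bits per second."""
    return width * height * bits_per_sample * samples_per_pixel * frames_per_second


# Assumptions: 8 bits per RGB component, 3 components, 60 frames/sec.
rate = raw_bit_rate(1280, 720, 8, 3, 60)
print(f"{rate / 1e9:.2f} Gbits/sec")  # prints "1.33 Gbits/sec"
```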

In our experiments, each frame is decomposed into 64
uniform subbands, but more than half of the subbands
are not coded. This is determined based on initial
rate-distortion tradeoffs [9]. The BMA used in our
experiments employs a block size of 2 x 2 and a search
area of -2 to +2 in each dimension. Motion estimation
is performed only for the Y luminance component, and
the estimated motion vector field is subsequently used
for the motion compensation of the U and V chrominance
signals. A high order conditional entropy coder is
designed for the motion vector coordinates, and one
I-subband coder and one P-subband coder with vector
size of 2 x 2 are designed for each of the YUV
components. We set the maximum allowed numbers of
conditional probabilities for the motion entropy coder and
the I-subband and P-subband entropy coders to $C_1 = 4094$
and $C_2 = 512$. The BACs used employ a skew factor
between 1 and 256.

Figure 4: (a) Overall rate usage and (b) PSNR performance for the
proposed coder. 


For each rate-distortion point, the total memory re- 
quired to store both the I-subband and P-subband RVQ 
codebooks and associated tables of conditional entropy 
codes is approximately 4.6 kilobytes. Moreover, only 512 
bytes are required by the motion entropy coder. For anal- 
ysis, quantization using dynamic M-search, and BAC en- 
coding, approximately 27 multiplies and 32 adds per pixel 
are required. Only 3 multiplies and 14 adds are required 
for BAC decoding, inverse quantization, and synthesis. 
Not only are the encoding complexity and memory relatively
small, but the performance is also good. Figure 4 (a)
shows the average bits per pixel and Figure 4 (b) shows the
PSNR result of our coder in comparison with the MPEG-2
standard for 10 frames of the luminance component of
the color test video sequence BRITS. The average bit rate
is approximately 18.0 Mbits/sec and the average PSNR
is 34.75 dB for the proposed subband coder and 33.70
dB for MPEG-2. As is shown in the figure, the proposed
coder outperforms MPEG-2 by roughly 1 dB in average
PSNR. Moreover, although MPEG-2 requires less encoding
complexity and memory, the complexity and memory
requirements of our subband coder are still reasonable.
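
For reference, PSNR figures of this kind are conventionally computed from the mean squared error between the original and decoded frames. The Python sketch below uses the standard definition with a peak value of 255 for 8-bit samples; the text does not state how the per-frame values were averaged, so the simple arithmetic mean used here is an assumption.

```python
from math import log10


def psnr(original, decoded, peak=255):
    """PSNR in dB between two equally sized 2-D frames."""
    n = len(original) * len(original[0])
    mse = sum((a - b) ** 2
              for row_a, row_b in zip(original, decoded)
              for a, b in zip(row_a, row_b)) / n
    if mse == 0:
        return float("inf")  # identical frames
    return 10 * log10(peak ** 2 / mse)


def average_psnr(frame_pairs):
    """Arithmetic mean of per-frame PSNR values (assumed averaging rule)."""
    values = [psnr(original, decoded) for original, decoded in frame_pairs]
    return sum(values) / len(values)
```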